Abstract: In this paper, we consider the delay-sensitive power 
and transmission threshold control design in S-ALOHA network 
with FSMC fading channels. The random access system consists 
of an access point with K competing users, each has access to the 
local channel state information (CSI) and queue state information 
(QSI) as well as the common feedback (ACK/NAK/Collision) 
from the access point. We seek to derive the delay-optimal 
control policy (composed of threshold and power control). The 
optimization problem belongs to the memoryless policy K-agent 
infinite horizon decentralized Markov decision process (DEC- 
MDP), and finding the optimal policy is shown to be computation- 
ally intractable. To obtain a feasible and low complexity solution, 
we recast the optimization problem into two subproblems, namely 
the power control and the threshold control problem. For a 
given threshold control policy, the power control problem is 
decomposed into a reduced state MDP for single user so that the 
overall complexity is $\mathcal{O}(NJ)$, where $N$ and $J$ are the buffer size 
and the cardinality of the CSI states. For the threshold control 
problem, we exploit some special structure of the collision channel 
and common feedback information to derive a low complexity 
solution. The delay performance of the proposed design is shown 
to have substantial gain relative to conventional throughput 
optimal approaches for S-ALOHA. 

Index Terms — S-ALOHA, delay, Markov decision process 
(MDP), local channel state information (CSI), local queue state 
information (QSI), threshold control, power control. 

I. Introduction 

Random access network is a hot research topic due to 
its robustness in system performance. In particular, ALOHA 
is a popular example of random access protocol which has 
attracted a lot of research attention over the past two decades. 
One important application is the access network (such as the 
infrastructure mode in WiFi) where multiple nodes compete 
for transmission opportunity to transmit data to an access point 
(AP). In [1], the authors considered the design and analysis 
of the traditional buffered slotted ALOHA (S-ALOHA) in 
which finite users with infinite buffer attempt to transmit a 
backlogged packet according to a transmission probability 
in one slot, and the packet is successfully received if and 
only if exactly one packet is transmitted. In an asymmetric network 
(heterogeneous users), the stability region has only been 
obtained in the two- and three-user cases [2]. The study of the 
stability region for general number of users is difficult because 
the transition probability of the state space of the interacting 
queues alters from the non-empty to empty buffer case. In [3], 
the authors proposed a dominant system technique to obtain 
a lower bound for the stability region for the general case. In 
symmetric ALOHA network (homogeneous users), all users 
are statistically identical and hence, the stability region is 
degenerated to one dimension. It is shown in [1], [4] that 
the system is stable as long as the arrival rate is less than the 
average throughput. As a result, stability analysis is equivalent 
to the throughput analysis. The authors in [4] extended the 
protocol to an adaptive ALOHA over the multi-packet recep- 
tion (MPR) channel to maximize the system throughput. For 
instance, the transmission probability is a function of the local 
channel state information (CSI). In [5], the authors extended 
this to adaptive transmission rate and power control w.r.t. the 
CSI to maximize the throughput. In [6], it is shown that a 
simple adaptive permission probability scheme, namely binary 
scheduling, is throughput optimal for homogeneous users with 
adaptive transmission rate in collision channel. In the binary 
scheduling scheme, there is a transmission threshold in which 
user could attempt to transmit its backlogged packet only when 
its local CSI exceeds the threshold. 

In all the above works on stability and throughput analysis 
and optimization, the delay performance has been ignored 
completely. In practice, applications are delay-sensitive and 
it is critical to optimize the delay performance in S-ALOHA 
network to support realtime applications. In [7], the authors 
surveyed the recent works on delay analysis of traditional S- 
ALOHA network in which exact delay can be obtained only 
in two user case. In [8], the delay performance for finite user 
finite buffer is analyzed using the tagged user analysis (TUA) 
method. Although the channel fading is considered, adaptive 
transmission probability and rate with power control is not 
allowed. In [9], the trade-off between delay and energy in 
additive white Gaussian noise (AWGN) channel with no queue 
state information (QSI) is investigated. However, they assumed 
multi-access coding to ensure successful reception for each 
user even if all competing users transmit simultaneously. In 
[10], the authors proved that the longest queue highest possible 
rate (LQHPR) policy, which is a centralized control policy 
requiring perfect knowledge of global QSI and global CSI, is 
delay-optimal in symmetric network. While the above works 
deal with the delay performance of S-ALOHA network, there 
are still a lot of technical challenges to be solved. They are 
listed below. 

• Queue-aware power and threshold control for S- 
ALOHA: Previous literature focused either on the power 
control (under a fixed and common threshold for all users) 
for throughput optimization, or on the delay analysis of 
uncontrolled S-ALOHA network. Both the transmission 
threshold control and power control policies are important 
means to optimize the delay performance of S-ALOHA. 
However, due to the lack of global knowledge on CSI 
and QSI, it is quite challenging to design delay-sensitive 
control schemes for S-ALOHA networks. 

• Exploiting memory in the fading channels: Existing 
works have assumed memoryless adaptation in which 
the control actions are done independently slot by slot 
(assuming fading is i.i.d). While i.i.d fading could lead 
to simple solution, it fails to exploit the memory of the 
time varying fading channels, which is critical to boost 
the delay performance of S-ALOHA network. 

• Utilization of local QSI and common feedback 
information from the AP: Existing control policies on 
throughput optimization only adapt to the local CSI and 
do not exploit the local QSI or the common feedback 
information from the AP. This side information is 
also critical to improving the delay performance of the 
S-ALOHA network. 

In this paper, we shall propose a delay-sensitive power 
and transmission threshold control algorithm for S-ALOHA 
network which addresses the above three important issues. 
We consider a S-ALOHA network with K users. The trans- 
mit power and threshold control policies adapt to the local 
CSI, local QSI as well as common feedback information 
(ACK/NAK/Collision) from the AP. The delay-optimization 
problem belongs to the memoryless policy K-agent infinite 
horizon decentralized Markov decision process (DEC-MDP) 
[11]. The problem of finding the optimal policy is proved to 
be NP-hard [12], [13], which means that the optimal solution 
is computationally intractable. To obtain a feasible and low 
complexity solution, we recast the optimization problem into 
two subproblems, namely the power control and the threshold 
control problem. For a given threshold control policy, the 
power control problem is decomposed into a reduced state 
MDP for single user so that the overall complexity is $\mathcal{O}(NJ^2)$, 
where N and J are the buffer size and the cardinality of 
the CSI states. On the other hand, we solve the threshold 
control problem by exploiting the special structure of the S- 
ALOHA network and common feedback information to derive 
a low complexity solution. The delay performance of the 
proposed design is shown to have substantial gain relative to 
conventional solutions. 

This paper is organized as follows. In Section II, we outline 
the system model of the S-ALOHA network and define the 
delay-optimal control policy. In Section III, we formulate the 
delay-optimal problem and introduce the DEC-MDP model. 
In Section IV, we exploit the special structure of the symmetric 
network. We extend the framework to the asymmetric case in Section V and 
illustrate the performance via simulations in Section VI. A 
brief summary is given in Section VII. 
Fig. 1. The system model in symmetric S-ALOHA network. 



II. System Model 

In this section, we elaborate the system model, including 
the source and physical layer models, as well as the control 
policy in the symmetric network; the extension to the asymmetric 
case is given in Section V. We consider a K-user S-ALOHA network 
in this paper. The time dimension is partitioned into slots (each 
slot lasts $\tau$ seconds). The m-th slot is the time interval 
$(m\tau, (m+1)\tau)$, $m = 0, 1, 2, \dots$. Fig. 1 illustrates the top-level 
system model in the symmetric network. The K competing users 
are coupled together via the transmission threshold and power 
control policy. 



A. Source Model 

For simplicity, the packet arrivals of all the users 
are assumed to follow independent Poisson distributions with 
arrival rate $\lambda$ (number of packets per second). The packet 
length $N_b$ of the data source follows an exponential distribution 
with mean packet size $\bar{N}_b$ (bits per packet), and the buffer 
size is $N$ (packets). The QSI of the whole system at the 
m-th slot is denoted by $\mathbf{Q}_m = \{Q_{k,m}\}_{k=1}^{K} \in \mathcal{N}^K$, where 
$Q_{k,m}$ is the number of packets in the k-th user's buffer, and 
$\mathcal{N} = \{0, 1, 2, \dots, N\}$ denotes the finite state space of the local QSI 
for a single user. When the buffer is full, i.e., $Q_{k,m} = N$, it will 
not accept any new packets. 
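
The finite-buffer source model can be illustrated with a short Python sketch. It assumes, beyond what the text states, that the arrivals within one slot are drawn as a Poisson count with a given mean (for example $\lambda\tau$) and that packets arriving at a full buffer are simply dropped; the function names `poisson_sample` and `admit_packets` are illustrative only.

```python
import math
import random


def poisson_sample(mean: float, rng: random.Random) -> int:
    """Draw one Poisson-distributed integer with the given mean (Knuth's method)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    # Multiply uniforms until the running product drops to exp(-mean) or below.
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def admit_packets(queue_len: int, arrivals: int, buffer_size: int) -> int:
    """Return the new queue length; packets beyond the buffer size N are dropped."""
    return min(queue_len + arrivals, buffer_size)


if __name__ == "__main__":
    rng = random.Random(0)
    q = 0
    for _ in range(5):
        # Assumed per-slot mean of 0.8 packets, no service, buffer of 4 packets.
        q = admit_packets(q, poisson_sample(0.8, rng), buffer_size=4)
    print("queue length after 5 slots without service:", q)
```

The clipping in `admit_packets` is the only place where the buffer size $N$ enters, which mirrors the statement that a full buffer accepts no new packets.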



B. Physical Layer Model and Feedback Mechanism 

We consider a block fading channel between each user 
and the AP. The CSI at the m-th slot is denoted by $\mathbf{H}_m = 
\{H_{k,m}\}_{k=1}^{K} \in \mathcal{S}^K$, where $H_{k,m}$ is the channel gain for 
user k, and $\mathcal{S} = \{S_j\}_{j=1}^{J}$ denotes the set of J CSI states for 
a single user. $\{H_{k,m}\}_{m=1}^{\infty}$ is modeled as a stationary ergodic 
process [14], which is independent among users. Specifically, 
let $p_{ij} = \Pr\{H_{k,m} = S_j | H_{k,m-1} = S_i\}$ be the state 
transition probability and $\pi_j = \Pr\{H_{k,\infty} = S_j\}$ be the 
stationary probability. All the users share a common spectrum 
with a bandwidth of $W$ Hz using the S-ALOHA protocol. The 
signal received by the AP at the m-th slot is given by: 



$$y[m] = \sqrt{H_{k,m}}\, x_k[m] + z[m] \qquad (1)$$



where $x_k[m]$ is the transmit signal of the k-th user at the m-th 
slot, and $\{z[m]\}_{m=1}^{\infty}$ is the i.i.d. $\mathcal{N}(0, N_0)$ noise. Suppose that 
only the k-th user attempts to transmit its packet to the AP at 
the m-th slot. The maximum achievable data rate (b/s) of the 
k-th user is given by: 

$$R(P_{k,m}, H_{k,m}) = W \log_2\left(1 + \frac{P_{k,m} H_{k,m}}{N_0 W}\right) \qquad (2)$$

where $P_{k,m}$ and $H_{k,m}$ are the transmit power and channel gain of the k-th 
user at the m-th slot. 
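
As a quick numerical illustration of the rate expression, the following Python sketch evaluates the Shannon rate of a single transmitting user. It assumes the received SNR is $P H / (N_0 W)$, which is the form consistent with the $N_0 W / H$ term of the water-filling solution later in the paper; the function and parameter names are illustrative.

```python
import math


def achievable_rate(power: float, gain: float, bandwidth_hz: float, noise_psd: float) -> float:
    """Rate in bits/s, W * log2(1 + P*H / (N0*W)), for a single transmitting user."""
    if power < 0 or gain < 0:
        raise ValueError("power and channel gain must be non-negative")
    snr = power * gain / (noise_psd * bandwidth_hz)
    return bandwidth_hz * math.log2(1.0 + snr)


if __name__ == "__main__":
    # Assumed example values: W = 2 Hz, N0 = 0.5 W/Hz, so unit power and gain give SNR = 1.
    print(achievable_rate(power=1.0, gain=1.0, bandwidth_hz=2.0, noise_psd=0.5))
```

With zero transmit power the rate is zero, which matches the convention that a user below the threshold does not transmit.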

To decouple the delay-optimal design from the detailed 
implementation of the modulation and coding in the physical 
layer, we assume that the data rate in (2) is achievable. In fact, 
it has been shown [15] that Shannon's limit in (2) can 
be achieved to within 0.05 dB SNR using LDPC with 2K 
byte block size at 1% PER. We consider a collision channel 
for the S-ALOHA random access and hence, the AP can 
decode the data successfully only when exactly one user is 
transmitting in a time slot. At the end of each slot, the 
AP broadcasts the ACK/NAK/Collision feedback, denoted as 
$Z \in \mathcal{Z} = \{1, 0, e\}$ [16], to all the K users in the network. 
Specifically, ACK ($Z = 1$) means that exactly one user has 
transmitted a packet and the data was successfully decoded; 
NAK ($Z = 0$) means that no user has transmitted and 
hence, no data was received; Collision ($Z = e$) means that at 
least two users have transmitted and the data was corrupted.$^{1}$ 
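
The feedback rule of the collision channel is simple enough to state as code. The Python sketch below maps the set of users that transmitted in a slot to the broadcast feedback symbol; the constant names and the use of the string `"e"` for a collision are illustrative choices, not notation from the paper.

```python
from typing import Sequence, Union

ACK, NAK, COLLISION = 1, 0, "e"


def common_feedback(transmitting: Sequence[bool]) -> Union[int, str]:
    """Return the feedback Z broadcast by the AP for one slot of a collision channel."""
    active = sum(1 for flag in transmitting if flag)
    if active == 0:
        return NAK        # nobody transmitted, nothing received
    if active == 1:
        return ACK        # exactly one user transmitted and was decoded
    return COLLISION      # two or more users transmitted, data corrupted


if __name__ == "__main__":
    print(common_feedback([False, True, False]))
```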

C. Control Policy 

Each user decides whether to transmit a packet at the 
beginning of a slot using a threshold mechanism. Due to 
symmetry, a user will transmit if its buffer is not empty and 
its local CSI exceeds a common system threshold $\gamma_m$.$^{2}$ If 
more than one backlogged user's local CSI exceeds the 
threshold, then a collision will occur and none of the packets 
will get through. As a result, $\gamma_m$ determines the priority on 
the access opportunity of each user. In this paper, we 
consider an adaptive threshold control that exploits the fading 
memory to minimize the system delay. A stationary threshold 
control policy $\pi_\gamma$ is defined below: 

Definition 1 (Stationary Threshold Control Policy): A 
stationary threshold control policy $\pi_\gamma : \mathcal{S} \times \mathcal{Z} \to \mathcal{S}$ is defined 
as the mapping from the previous slot's system threshold 
$\gamma_{m-1}$ and common feedback $Z_{m-1}$ from the AP to the 
system threshold $\pi_\gamma(\gamma_{m-1}, Z_{m-1}) = \gamma_m$ in the current slot. 
The set of all feasible stationary policies $\pi_\gamma$ is denoted as 
$\mathcal{P}_\gamma = \{\pi_\gamma : \pi_\gamma(\gamma_{m-1}, Z_{m-1}) \in \mathcal{S}\}$. 

The threshold control adapts only to information that is common 
to all the K users and hence, each user can determine the 
system threshold from the feedback from the AP alone. 

$^{1}$ Since we assume strong coding is used by each user, we ignore the case 
of transmission error. 

$^{2}$ In the symmetric network, users are statistically identical (e.g., same fading 
channel, same packet arrival rate and same average power constraint) and a 
common threshold is reasonable for fairness (achieving the same 
average delay performance). On the other hand, for the asymmetric network, 
we consider the flexibility of different thresholds for different users 
(because the users are no longer statistically identical). 

$^{3}$ We have assumed a deterministic threshold control policy here. In fact, 
the same formulation and approach can be used to deal with a transmission 
probability approach rather than a threshold approach. The users transmit with 
a probability at each CSI state according to a probability function $\varphi(H) \in 
[0,1]$. The transmission control policy is defined as $\pi_\varphi(\varphi_{m-1}, Z_{m-1}) = 
\varphi_m$, i.e., a mapping from the common information to the current slot's transmission 
probability function. 



Denote $\boldsymbol{\chi}_m = \{\mathbf{Q}_m, \mathbf{H}_{m-1}, \gamma_{m-1}, Z_{m-1}, \mathbf{H}_m\}$ to be 
the global system state at the m-th slot and $\chi_{k,m} = 
\{Q_{k,m}, H_{k,m-1}, \gamma_{m-1}, Z_{m-1}, H_{k,m}\}$ to be the local system 
state which is observable locally at the k-th user. Note that 
$\{\gamma_{m-1}, Z_{m-1}\}$ is the common information for all users, and 
$\{Q_{k,m}, H_{k,m-1}, H_{k,m}\}$ is the local information for the k-th 
user. Given the observed local system state realization $\chi_{k,m}$, 
the k-th user adjusts the transmission power according 
to a stationary power control policy $\pi_P$, which is formally 
defined below. 

Definition 2 (Stationary Power Control Policy): The 
stationary power control policy for a single user $\pi_P : \mathcal{N} \times \mathcal{S} \times 
\mathcal{S} \times \mathcal{Z} \times \mathcal{S} \to \mathbb{R}^+$ is defined as the mapping from the current local 
system state of the k-th user to the current slot's transmit power, 
$\pi_P(\chi_{k,m}) = P_{k,m}$. The set of all feasible stationary policies 
$\pi_P$ is defined as $\mathcal{P}_P = \{\pi_P : \pi_P(\chi_{k,m}) \ge 0\}$. Note that 
$P_{k,m} = 0$ for all $H_{k,m} < \gamma_m$, because the current slot's CSI is 
lower than the threshold. 

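For simplicity, let $\pi = \{\pi_\gamma, \pi_P\}$ denote the joint control 
policy of all the K users. The corresponding set of stationary 
joint control policies is given by $\mathcal{P} = \{\mathcal{P}_\gamma, \mathcal{P}_P\}$. As 
a result, $\pi(\boldsymbol{\chi}_m) = \{\pi_\gamma(\gamma_{m-1}, Z_{m-1}), \{\pi_P(\chi_{k,m})\}_{k=1}^{K}\} = 
\{\gamma_m, \{P_{k,m}\}_{k=1}^{K}\}$. 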

In practice, the user with empty buffer will not transmit 
even if its local CSI exceeds the system threshold, and this 
is one important technical challenge in the delay analysis of 
S-ALOHA network. Instead of dealing with the delay for the 
original S-ALOHA network, we shall utilize the technique of 
dominant system [3] to obtain an upper bound of the delay 
performance. In the dominant system, we assume users always 
have virtual packets to send (even if the buffer is empty) and 
therefore, the delay performance associated with the dominant 
system is always an upper bound of the actual system. Yet, 
the bound is asymptotically tight in the large delay regime. 

III. Problem Formulation 

In this section, we shall first formulate the delay-optimal 
control policy problem, and then formally introduce DEC- 
MDP model. We show that our problem belongs to the 
memoryless policy case of DEC-MDP in which finding the 
optimal policy is computationally intractable. 

A. System Delay 

Due to the nature of random access, the queues of the K 
users are coupled together via the control policy. When the 
system threshold is small, there is a high probability that 
more than one user sends a packet, leading to collision 
and wastage of power. On the other hand, when the 
system threshold is high, there is a non-negligible probability that 
no user sends a packet, leading to wasted idle slots. 
Similarly, an individual user may want to increase the transmit 
power when the local CSI is good, but if there is a collision, 
the transmitted power is wasted. In this paper, we seek to find 
an optimal stationary control policy to minimize the average 
delays of the K competing users subject to an average transmit 
power constraint for each single user. 

$^{4}$ When the transmission probability approach is applied, the 
local system state for the power control policy should be 
$\chi_{k,m} = \{Q_{k,m}, H_{k,m-1}, \varphi_{m-1}(H), Z_{m-1}, H_{k,m}\}$. We further 
discretize the transmission probability function $\varphi(H)$ to make the system 
state discrete. The optimization of the control policy then follows the same solution 
path as in the threshold approach. 

Specifically, the average delay 
for the k-th user is 



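$$\bar{T}_k(\pi) = \limsup_{M \to \infty} \frac{1}{M\lambda}\, \mathbb{E}\left[\sum_{m=1}^{M} Q_{k,m}\right], \quad \forall k \in \{1, \dots, K\} \qquad (3)$$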



and the average transmit power constraint is given by 

$$\bar{P}_k(\pi) = \limsup_{M \to \infty} \frac{1}{M}\, \mathbb{E}\left[\sum_{m=1}^{M} P_{k,m}\right] \le P_0 \qquad (4)$$



where $P_{k,m}$ is the transmit power determined by $\pi(\chi_{k,m})$, 
and $P_0$ is the average power constraint for a single user. The 
delay-optimal control problem can be formally written as: 

Problem 1 (Delay-Optimal S-ALOHA Control Policy): 
Find a stationary control policy $\pi$ that minimizes 

$$\limsup_{M \to \infty} \frac{1}{M} \sum_{k,m} \mathbb{E}\left[g_k(\chi_{k,m}, \pi(\chi_{k,m}))\right] \qquad (5)$$

where $g_k(\chi_{k,m}, \pi(\chi_{k,m})) = Q_{k,m} + \xi P_{k,m}$ is the per-stage 
system price$^{5}$ function and $\xi > 0$ is the Lagrange multiplier 
corresponding to the average power constraint in (4). 

B. DEC-MDP Model 

Problem 1 in (5) in fact belongs to the class of infinite 
horizon DEC-MDP, which is formally defined below [11]: 

Definition 3 (DEC-MDP): A K-agent DEC-MDP is given 
as a tuple 

$$\{I, S, A, P(s'|s, a), R(s, a), p_0\}$$

where $I = \{1, \dots, K\}$ is the set of agents, $S = \{S_k\}$ is a finite 
set of states, $A = \{A_k\}$ is a set of joint actions, $S_k$ and $A_k$ are 
available to agent k, $P(s'|s, a)$ is the probability of a transition 
from state $s$ to $s'$ given that joint action $a$ is taken, $R(s, a)$ 
is the price function given state $s$ and joint action $a$, and 
$p_0$ is the initial state distribution of the system.$^{6}$ 

The association between Problem 1 and the DEC-MDP is as 
follows: we have $s_k = \chi_{k,m}$ and $a_k = \pi_k(\chi_{k,m})$; $P(s'|s, a)$ can be easily 
obtained from the local system state transition $P(s_k'|s_k, a_k)$ given 
in Lemma 1, and $R(s, a) = \sum_{k=1}^{K} g_k(\chi_{k,m}, \pi_k(\chi_{k,m}))$. 

When the policy is given by a mapping from histories of 
local system states $\{s_{k,1}, \dots, s_{k,m}, \dots\}$ to actions $a_k \in A_k$, the 
problem is undecidable$^{7}$ [21]. When the policy is given by a 
mapping from the current local system state $s_k$ to actions $a_k \in A_k$, 
it is called a memoryless or reactive policy. In that case, the 
problem is NP-hard [12], [13]. As a result, it is very difficult to 
obtain the optimal solution of Problem 1. Instead of a brute-force 
solution, we exploit the special structure of 
our problem to obtain low complexity solutions. 

$^{5}$ In [17], it is named price, yet it is called cost in [18]. If it is called a reward, 
then the problem is to maximize the reward. 

$^{6}$ More details about the infinite horizon DEC-MDP are provided in [19] and 
the references therein. 

$^{7}$ Undecidability is a formal term in computational complexity theory 
used to address the computability and complexity of decision problems. 
A decision problem is called (recursively) undecidable if no algorithm can 
decide it, as is the case for Turing's halting problem. It has nothing to do with 
whether an optimal solution of an optimization problem exists (or whether there are 
multiple solutions), because that depends fundamentally on the structure of 
the problem. Yet, even if an optimal solution of an undecidable problem exists 
theoretically, there is no (iterative) algorithm that obtains the optimal solution 
and terminates [20]. 

IV. Delay-Optimal Control Problem in Symmetric 
Network 

In this section, we will focus on exploiting the special 
structure of the symmetric network. We shall first solve an 
optimal power control policy by a reduced state MDP for 
any given threshold control policy. To solve the threshold 
control problem, we utilize the collision channel mechanism 
and derive a low complexity solution. 

A. Embedded Markov Chain under a Given Threshold Control 
Policy 

For a given threshold control policy, the observed local 
system state of a single user evolves as a Markov 
chain. Specifically, the transition probability conditioned on 
the power control policy $\pi_P$ is given in the following lemma. 

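Lemma 1 (Transition Probability of Local System State): 
At the m-th slot, the current state of the k-th user is 
$\chi_{k,m} = \{Q_{k,m}, H_{k,m-1}, \gamma_{m-1}, Z_{m-1}, H_{k,m}\}$. Conditioned 
on $\pi_P$, the transition probability to the next slot is given by: 

$$\begin{aligned}
\Pr\{\chi_{k,m+1} | \chi_{k,m}, \pi_P(\chi_{k,m})\} ={}& \mathbf{1}\big(\gamma_m = \pi_\gamma(\gamma_{m-1}, Z_{m-1})\big) \\
&\times \Pr\{H_{k,m+1} | H_{k,m}\}\, \Pr\{Z_m | Z_{m-1}, \{H_{k,t}, \gamma_t\}_{t=m-1}^{m}\} \\
&\times \Pr\{Q_{k,m+1} | Q_{k,m}, Z_m, \pi_P(\chi_{k,m})\}
\end{aligned} \qquad (6)$$

where $\mathbf{1}(X)$ is an indicator function, which is equal to 1 when 
event $X$ is true and 0 otherwise. 

Proof: Please refer to Appendix A. ■ 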

B. Reduced State MDP Formulation and Optimal Power Con- 
trol Policy 

For a given threshold control policy in (5), we seek to find 
an optimal power control policy to minimize 

$$J_{\pi_P}(\chi_1) = \lim_{M \to \infty} \frac{1}{M} \sum_{k,m} \mathbb{E}\left[g(\chi_{k,m}, \pi_P(\chi_{k,m}))\right] \qquad (7)$$

Note that the power control policy is a function of the local system 
state, and for the k-th user, the local system state transition 
probability is given in (6). The optimal power control problem 
in (7) can be decoupled into K single-user optimization 
problems, each of which can be modeled as an MDP, as summarized 
in the following lemma. 

Lemma 2 (Power Control Optimization for Single User): 
The optimal power control policy$^{8}$ minimizing the whole 
system delay can be modeled as a single-user MDP 
problem, with state space given by the local system state $\chi_m$ 
(ignoring the user index k). The transition probability is given by 
$\Pr\{\chi_{m+1} | \chi_m, \pi_P(\chi_m)\}$ from Lemma 1 and the average price is 
given by: 

$$J_{\pi_P}(\chi_1) = \lim_{M \to \infty} \frac{1}{M} \sum_{m=1}^{M} \mathbb{E}\left[g(\chi_m, \pi_P(\chi_m))\right] \qquad (8)$$

$^{8}$ The power action set is compact, due to finite transmit power in practice. 
By Theorem 8.4.7 in [17], there exists a stationary and deterministic policy 
that is average optimal. Thus, there is no loss of optimality for this power control 
policy. 

For the infinite horizon MDP, the optimal policy can be 
obtained by solving the Bellman equation recursively w.r.t. 
$(\theta, \{V(\chi)\})$ as below: 

$$V(\chi_m) + \theta = \inf_{a(\chi_m)} \left[ g(\chi_m, a(\chi_m)) + \sum_{\chi_{m+1}} \Pr\{\chi_{m+1} | \chi_m, a(\chi_m)\}\, V(\chi_{m+1}) \right] \qquad (9)$$

where $a(\chi_m) = \pi_P(\chi_m)$ is the power allocation when the state is 
$\chi_m$. If there is a $(\theta, \{V(\chi)\})$ satisfying (9), then $\theta$ is the 
optimal average price per stage $J_{\pi_P^*}(\chi_1)$ and the corresponding 
optimizing policy is given by $a^*(\chi_m)$, the optimizing action 
of (9) at state $\chi_m$. 
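
To make the average-price Bellman equation concrete, the following Python sketch runs relative value iteration on a generic finite MDP. It is not the paper's algorithm for the S-ALOHA state space: it assumes a small tabular model with finite action sets, given as nested lists `cost[s][a]` and `trans[s][a][s_next]`, and it pins the value of a reference state to zero so that the iterates stay bounded. All names and the two-state example are hypothetical.

```python
from typing import List, Sequence, Tuple


def relative_value_iteration(
    cost: Sequence[Sequence[float]],
    trans: Sequence[Sequence[Sequence[float]]],
    ref_state: int = 0,
    tol: float = 1e-10,
    max_iter: int = 100_000,
) -> Tuple[float, List[float], List[int]]:
    """Approximately solve V(s) + theta = min_a [cost(s,a) + sum_s' P(s'|s,a) V(s')]."""
    n = len(cost)
    values = [0.0] * n

    def q_value(s: int, a: int, v: Sequence[float]) -> float:
        return cost[s][a] + sum(p * v[s2] for s2, p in enumerate(trans[s][a]))

    theta = 0.0
    for _ in range(max_iter):
        backed_up = [min(q_value(s, a, values) for a in range(len(cost[s]))) for s in range(n)]
        theta = backed_up[ref_state]              # current estimate of the average price
        new_values = [b - theta for b in backed_up]
        gap = max(abs(x - y) for x, y in zip(new_values, values))
        values = new_values
        if gap < tol:
            break
    policy = [
        min(range(len(cost[s])), key=lambda a, s=s: q_value(s, a, values))
        for s in range(n)
    ]
    return theta, values, policy


if __name__ == "__main__":
    # Hypothetical 2-state, 2-action model: action 0 jumps uniformly, action 1 stays put.
    cost = [[2.0, 1.0], [0.5, 3.0]]
    trans = [[[0.5, 0.5], [1.0, 0.0]], [[0.5, 0.5], [0.0, 1.0]]]
    theta, values, policy = relative_value_iteration(cost, trans)
    print(theta, values, policy)
```

In the example, staying in state 0 costs 1 per stage and is the best long-run choice, so the iteration settles on an average price of 1 with state 1 acting as a transient state.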

Value or policy iteration can be used to solve the Bellman 
equation (9) [17], [18]. The challenge for the two iterative 
algorithms lies in the size of the local state space. To reduce 
the complexity, we recast the original MDP in 
Lemma 2 into a reduced state MDP. By partitioning the policy 
$\pi_P$ into a collection of actions, the above MDP can be 
further reduced to a simpler MDP over the reduced state 
$\tilde{\chi}_m = \{Q_m, H_{m-1}, \gamma_{m-1}, Z_{m-1}\}$ only.$^{9}$ Specifically, we 
have the following definition: 

Definition 4 (Conditional Action): Given a policy $\pi_P$, we 
define $\pi_P(\tilde{\chi}_m) = \{\pi_P(\chi_m) : \chi_m = (\tilde{\chi}_m, H_m)\ \forall H_m\}$ as 
the collection of actions under a given reduced state $\tilde{\chi}_m$ 
for all possible realizations of the current slot's CSI $H_m$. The policy $\pi_P$ is 
therefore equal to the union of all conditional actions, i.e., 

$\pi_P = \bigcup_{\tilde{\chi}} \pi_P(\tilde{\chi})$. 

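Taking the conditional expectation (conditioned on $\tilde{\chi}_m$) on 
both sides of (9), and letting $\tilde{V}(\tilde{\chi}_m) = \mathbb{E}[V(\chi_m) | \tilde{\chi}_m] = 
\sum_{H_m} \Pr\{H_m | H_{m-1}\} V(\chi_m)$, the Bellman equation becomes: 

$$\begin{aligned}
\tilde{V}(\tilde{\chi}_m) + \theta &= \inf_{a(\tilde{\chi}_m)} \sum_{H_m} \Pr\{H_m | H_{m-1}\} \Big( g(\chi_m, a(\chi_m)) + \sum_{\chi_{m+1}} \Pr\{\chi_{m+1} | \chi_m, a(\chi_m)\}\, V(\chi_{m+1}) \Big) \\
&= \inf_{a(\tilde{\chi}_m)} \Big[ \sum_{H_m} \Pr\{H_m | H_{m-1}\}\, g(\chi_m, a(\chi_m)) \\
&\qquad + \sum_{\tilde{\chi}_{m+1}} \sum_{H_m} \Pr\{H_m | H_{m-1}\} \Pr\{\tilde{\chi}_{m+1} | \chi_m, a(\chi_m)\} \sum_{H_{m+1}} \Pr\{H_{m+1} | H_m\}\, V(\tilde{\chi}_{m+1}, H_{m+1}) \Big] \\
&= \inf_{a(\tilde{\chi}_m)} \Big[ \tilde{g}(\tilde{\chi}_m, a(\tilde{\chi}_m)) + \sum_{\tilde{\chi}_{m+1}} \Pr\{\tilde{\chi}_{m+1} | \tilde{\chi}_m, a(\tilde{\chi}_m)\}\, \tilde{V}(\tilde{\chi}_{m+1}) \Big]
\end{aligned} \qquad (10)$$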

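where $a(\chi_m) = \pi_P(\chi_m)$ is a single power allocation action 
at state $\chi_m$ and $a(\tilde{\chi}_m) = \pi_P(\tilde{\chi}_m)$ is the collection of 
power allocation actions under a given reduced state $\tilde{\chi}_m$. 
Furthermore, $\tilde{g}(\tilde{\chi}_m, a(\tilde{\chi}_m))$ is the conditional per-stage price 
function given by: 

$$\tilde{g}(\tilde{\chi}_m, a(\tilde{\chi}_m)) = \mathbb{E}\left[ g(\tilde{\chi}_m, H_m, a(\chi_m)) \,\middle|\, \tilde{\chi}_m \right] \qquad (11)$$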

As a result, the original MDP is equivalent to a reduced 
state MDP, which is summarized in the following lemma. 

Lemma 3 (Equivalent MDP on a Reduced State Space): 
The original MDP in Lemma 2 is equivalent to the following 
reduced state MDP with state space given by $\tilde{\chi}_m$ and average 
price given by: 

$$J_{\pi_P}(\tilde{\chi}_1) = \limsup_{M \to \infty} \frac{1}{M} \sum_{m=1}^{M} \mathbb{E}\left[\tilde{g}(\tilde{\chi}_m, a(\tilde{\chi}_m))\right] \qquad (12)$$

The state transition kernel $\Pr\{\tilde{\chi}_{m+1} | \tilde{\chi}_m, a(\tilde{\chi}_m)\}$ is given by: 

$$\Pr\{\tilde{\chi}_{m+1} | \tilde{\chi}_m, a(\tilde{\chi}_m)\} = \sum_{H_m} \Pr\{H_m | H_{m-1}\} \Pr\{\tilde{\chi}_{m+1} | \chi_m, a(\chi_m)\} \qquad (13)$$

The Bellman equation for the reduced state MDP is given in 
(10). Note that while the reduced state MDP is defined over 
the partial state $\tilde{\chi}$, the power allocation is still a function of the 
original complete local system state. In fact, for each realization of 
the reduced state $\tilde{\chi}_m$, the solution of the reduced MDP gives 
the conditional actions for the different realizations of $H_m$. 

C. Delay-Optimal Power Control Solution 

Value or policy iteration can be used to solve the Bellman 
equation (10), and the convergence of the iterative algorithms 
is ensured by the following lemma. 

Lemma 4 (Decidability of the Unichain of Reduced State): 
The unichain of the reduced state MDP in Lemma 3 is 
decidable under all power control policies. 

Proof: Please refer to Appendix B. ■ 

The number of unichains of the reduced state MDP in 
Lemma 3 depends on the number of recurrent classes of the local 
system state (excluding the queue state Q) in $\tilde{\chi}_m$, i.e., 
$\Phi_m = \{H_i, \gamma_i, Z_i\}_{i=m-1}$. Value or policy iteration 
can be applied to the different unichains separately, while 
convergence and a unique solution are ensured [17]. Specifically, 
the Bellman equation (10) can be elaborated in an offline 
manner as follows: 



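$$\begin{aligned}
\tilde{V}(\tilde{\chi}_m) + \theta &= \inf_{a(\tilde{\chi}_m)} \Big[ \tilde{g}(\tilde{\chi}_m, a(\tilde{\chi}_m)) + \sum_{\tilde{\chi}_{m+1}} \Pr\{\tilde{\chi}_{m+1} | \tilde{\chi}_m, a(\tilde{\chi}_m)\}\, \tilde{V}(\tilde{\chi}_{m+1}) \Big] \\
&= \inf_{\pi_P(\tilde{\chi}_m)} \Big[ \tilde{g}(\tilde{\chi}_m, \pi_P(\tilde{\chi}_m)) + \sum_{\Phi_{m+1}} \Pr\{\Phi_{m+1} | \Phi_m\} \Big( \tau\lambda\, \tilde{V}((q_m + 1) \wedge N, \Phi_{m+1}) \\
&\qquad + \tau\mu\, \tilde{V}((q_m - 1)^+, \Phi_{m+1}) + (1 - \tau\lambda - \tau\mu)\, \tilde{V}(q_m, \Phi_{m+1}) \Big) \Big]
\end{aligned} \qquad (14)$$

where $\mu = \mu(\tilde{\chi}_m, H_m, \pi_P(\chi_m))$ is the mean packet service 
rate, $x \wedge N = \min\{x, N\}$, and $P(\chi_m) = \pi_P(\chi_m)$. 
On the right hand side of (14), $P(\chi_m)$ only influences $\mu$ and 
$\tilde{g}$ in (11). 

$^{9}$ A similar technique was also used in [22], [23]. 

$^{10}$ In [17], a unichain is defined as a single recurrent class plus a possibly 
empty set of transient states. 

Specifically, $\Pr\{\Phi_{m+1} | \Phi_m\}$ is given simply 
by $\Pr\{H_m | H_{m-1}\} \Pr\{Z_m | Z_{m-1}\}$. Hence, the optimal power 
control policy for a system state $\chi_m$ is given by: 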



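$$\begin{aligned}
P(\chi_m) &= \arg\min_{P(\chi_m)} \sum_{H_m} \Pr\{H_m | H_{m-1}\} \Big[ \xi P(\chi_m) + \tau\mu \Pr\{Z_m = 1 | Z_{m-1}\}\, \delta(q_m, H_m, \gamma_m) \Big] \\
&= \left( -\frac{W \tau \Pr\{Z_m = 1 | Z_{m-1}\}\, \delta(q_m, H_m, \gamma_m)}{\bar{N}_b\, \xi \ln 2} - \frac{N_0 W}{H_m} \right)^+
\end{aligned} \qquad (15)$$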

where $\delta(q_m, H_m, \gamma_m) = \tilde{V}((q_m - 1)^+, H_m, \gamma_m, Z_m = 
1) - \tilde{V}(q_m, H_m, \gamma_m, Z_m = 1)$. Note that the optimal power 
control action depends on the local CSI via the standard 
water-filling form. On the other hand, it also depends on 
the local QSI and the common feedback Z through the water 
level.$^{11}$ Using the optimal power allocation policy, the transition 
probability of the reduced state is $\Pr\{\tilde{\chi}_{m+1} | \tilde{\chi}_m\} = 
\Pr\{Q_{m+1} | \tilde{\chi}_m, H_m, \pi_P(\chi_m)\} \Pr\{\Phi_{m+1} | \Phi_m\}$. The stationary 
distribution of $\tilde{\chi}$, denoted $\omega(\tilde{\chi})$, can be found from the 
linear equations $\omega(\tilde{\chi}_j) = \sum_i \omega(\tilde{\chi}_i) \Pr\{\tilde{\chi}_j | \tilde{\chi}_i\}$. Finally, the 
Lagrange multiplier $\xi$ is chosen to satisfy the average power 
constraint per user $P_0$: 



$$P_0 = \sum_{\tilde{\chi}_m} \omega(\tilde{\chi}_m) \sum_{H_m} \Pr\{H_m | H_{m-1}\}\, P(\chi_m) \qquad (16)$$
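
The water-filling structure of the power control action can be sketched in a few lines of Python. The sketch assumes the closed form $P = \big(-W\tau \Pr\{Z_m = 1 | Z_{m-1}\}\,\delta / (\bar{N}_b\, \xi \ln 2) - N_0 W / H_m\big)^+$ with $\delta \le 0$ (the value function difference between one packet fewer and the current queue length), and it takes $\delta$ and the success probability as given inputs rather than computing them from the MDP. Function and parameter names are illustrative.

```python
import math


def water_filling_power(
    delta: float,         # V((q-1)^+, ...) - V(q, ...), expected to be <= 0
    p_success: float,     # Pr{Z_m = 1 | Z_{m-1}}
    gain: float,          # channel gain H_m, must be > 0
    xi: float,            # Lagrange multiplier on transmit power, must be > 0
    bandwidth_hz: float,  # W
    slot_s: float,        # slot duration tau
    noise_psd: float,     # N0
    mean_bits: float,     # mean packet size in bits
) -> float:
    """Return max(water_level - N0*W/H, 0) under the assumptions stated above."""
    if gain <= 0 or xi <= 0:
        raise ValueError("gain and xi must be positive")
    water_level = -bandwidth_hz * slot_s * p_success * delta / (mean_bits * xi * math.log(2))
    return max(water_level - noise_psd * bandwidth_hz / gain, 0.0)


if __name__ == "__main__":
    # Assumed toy numbers: water level 2, inverse channel term 1, so the power is about 1.
    print(water_filling_power(-2 * math.log(2), 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))
```

A longer queue typically makes $\delta$ more negative and therefore raises the water level, which is how the queue state enters the power allocation in this form.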



D. Threshold Control Policy 

The threshold control policy is determined based on the common 
information $\{\gamma_{m-1}, Z_{m-1}\}$. Full exploitation of 
the known information is critical to improving the delay 
performance of the system. In fact, the common information 
$\{\gamma_{m-1}, Z_{m-1}\}$ can be used to exploit the memory of all 
the K competing users' fading channels, and to predict their 
transmission events in the current slot. Specifically, in the 
collision channel, data will be successfully received by the 
AP in the S-ALOHA network if and only if exactly one user 
transmits in a slot. Consequently, the threshold 
shall be chosen based on the known information so that the user with the largest CSI 
transmits alone with the highest probability. Based on 
this observation, we propose a larger CSI higher priority 
(LCSIHP) threshold control policy as follows: 



$$\gamma_m^* = \arg\max_{\gamma_m} \Pr\{\text{only 1 user transmits} \mid \gamma_{m-1}, Z_{m-1}\} \qquad (17)$$



$^{11}$ As a sanity check, when the CSI are i.i.d. and the control policies 
are not functions of the QSI (i.e., $\pi_\gamma(H) : \mathcal{S} \to \mathcal{S}$, $\pi_P(H) : \mathcal{S} \to \mathbb{R}^+$), using 
a similar reduced state MDP technique, the optimal power control policy is 
given by $\big( W\tau (\sum_{S_i < \gamma} \pi_i)^{K-1} \delta / (\bar{N}_b \ln 2) - N_0 W / H_m \big)^+$, where 
$\delta = (V(q_m) - V((q_m - 1)^+)) / \xi$ is the new Lagrange multiplier, 
considered a constant since the QSI influence is ignored. The optimal 
threshold $\gamma$ can then be obtained. It is the same as the binary scheduling with 
power control w.r.t. the CSI studied in [5], called Variable-Rate Algorithms. 



where $\Pr\{\text{only 1 user transmits} \mid \gamma_{m-1}, Z_{m-1}\}$ is given by: 

-1, Zrn-l} 



Pr{only 1 user transmits |7„j- 



ifZ„ 
if Z„ 



-1 = 
-1 = 1 



{K,k) 



k)_ 



[K — k)C, vv 



(K-k-r 



if Zra-l = e 



where p^^''^' 



(18) 

given in (l30b , is a function of 7, and 
{u, U, C, (■} given in dZTl i (ignoring user index k) are functions 
of {7i}™ m-i- The way to obtain ([TtT i is to treat the previous 
slot's transmitted and non-transmitted users separately. As a 
sanity check, note that when the CSI are i.i.d., $\nu = v$ and 
$\zeta = v$ for all $\{\gamma_{m-1}, Z_{m-1}\}$, and equation (17) reduces to 
$\gamma_m^* = \arg\max_{\gamma_m} K v\, \bar{v}^{K-1}$, i.e., $\gamma_m^*$ is the same for all slots 
and maximizes the probability that only one user transmits. 
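
For the i.i.d. sanity check, the static threshold can be found by direct enumeration over the CSI states. The Python sketch below assumes that $v$ is the stationary probability that a user's CSI is at or above the threshold, that each of the other $K - 1$ users stays silent with probability $1 - v$, and that all users are backlogged; the function name and the uniform example distribution are illustrative and not taken from the paper.

```python
from typing import Sequence, Tuple


def best_static_threshold(
    states: Sequence[float], stationary: Sequence[float], num_users: int
) -> Tuple[float, float]:
    """Return (threshold, probability) maximizing K * v * (1 - v)^(K - 1)."""
    if len(states) != len(stationary) or num_users < 1:
        raise ValueError("inconsistent inputs")
    best_gamma, best_prob = states[0], -1.0
    for gamma in states:
        # v: probability that one user's CSI reaches the candidate threshold.
        v = sum(p for s, p in zip(states, stationary) if s >= gamma)
        prob = num_users * v * (1.0 - v) ** (num_users - 1)
        if prob > best_prob:
            best_gamma, best_prob = gamma, prob
    return best_gamma, best_prob


if __name__ == "__main__":
    csi_states = [1.0, 2.0, 3.0, 4.0]
    uniform = [0.25, 0.25, 0.25, 0.25]
    for k in (2, 4):
        print(k, best_static_threshold(csi_states, uniform, k))
```

In this toy setting the maximizing threshold moves to the largest CSI state as the number of users grows from 2 to 4, in line with the later remark that for many users transmission is only allowed in the best channel state.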

E. Summary of the Solution in Symmetric Network 

The overall power and threshold control solution in the symmetric 
network consists of an offline procedure and an online 
procedure, which are summarized below. 



Offline Procedure: The output of the offline procedure is the
optimal power allocation $\pi_P(\chi)$, which will be stored in a
table and used in the online procedure.

• Step 1) Determination of the threshold control
policy: Determine the threshold control policy from
(17) for each realization of $\{\gamma_{m-1}, Z_{m-1}\}$.

• Step 2) Acquire unichains of reduced state: From
the given threshold control policy, obtain the recurrent
classes of the reduced state $\tilde{\chi}$ from Lemma 4.

• Step 3) Determination of the optimal power
control policy: For a given $\xi$, determine $\theta(\xi)$ and
$\{V(Q_m, H_{m-1}, \gamma_{m-1}, Z_{m-1}; \xi)\}$ of the Bellman
equation (14) in every unichain of reduced state by the
policy or value iteration algorithm. The optimal power
control policy $\pi_P(\chi_m; \xi)$ is then determined in (15).

• Step 4) Transmit power constraint: For a given $\xi$,
the average transmit power $P_0$ can be obtained from (16).
Conversely, a numerical root-finding algorithm can be
used to determine the $\xi$ that satisfies a given $P_0$.
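
Step 4 can be illustrated with a simple bisection search. The sketch below assumes that the average transmit power is a continuous, decreasing function of the multiplier; this monotonicity, the function name `solve_multiplier` and the toy power curve are assumptions for illustration only. In the actual procedure, `avg_power` would stand for solving Step 3 and evaluating (16).

```python
def solve_multiplier(avg_power, target, lo=0.0, hi=1.0, tol=1e-9, max_iter=200):
    """Find xi such that avg_power(xi) is approximately equal to target.

    Assumes avg_power is continuous and decreasing in xi, with
    avg_power(lo) >= target (an assumption of this sketch).
    """
    # Grow the upper bracket until the power constraint is met.
    for _ in range(60):
        if avg_power(hi) <= target:
            break
        hi *= 2.0
    else:
        raise ValueError("target power not reachable within the search range")

    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if avg_power(mid) > target:
            lo = mid  # still spending too much power: increase the multiplier
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Toy stand-in for the average power obtained from Step 3 and (16).
    toy_power = lambda xi: 4.0 / (1.0 + xi)
    print(solve_multiplier(toy_power, target=1.0))
```

Bisection is used here only because it needs no derivative information; any bracketing root finder would serve the same purpose.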



Online procedure: Each homogeneous user observes $\chi_m =
\{Q_m, H_{m-1}, \gamma_{m-1}, Z_{m-1}, H_m\}$, the local system state
realization at the beginning of the $m$-th slot, and transmits
at the power given by $\pi_P(\chi_m)$. If $H_m < \pi_\gamma(\gamma_{m-1}, Z_{m-1})$, then
$P_m = \pi_P(\chi_m) = 0$, i.e., the user will not transmit.
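
The online procedure amounts to two table lookups per slot. The following Python sketch shows one possible organization; the dictionary layout, the integer encoding of CSI states and thresholds, and the name `online_power` are hypothetical choices for this illustration.

```python
def online_power(state, threshold_policy, power_table):
    """Return the transmit power for the local state observed in a slot.

    state            : (q, h_prev, gamma_prev, z_prev, h_now)
    threshold_policy : dict mapping (gamma_prev, z_prev) -> current threshold
    power_table      : dict mapping the full state -> power computed offline
    """
    q, h_prev, gamma_prev, z_prev, h_now = state
    gamma_now = threshold_policy[(gamma_prev, z_prev)]
    if h_now < gamma_now:
        return 0.0  # below threshold: the user stays silent
    return power_table[state]


if __name__ == "__main__":
    # Hypothetical toy tables with CSI states and thresholds encoded as indices.
    toy_threshold_policy = {(1, 0): 1, (1, 1): 2}
    toy_power_table = {(3, 0, 1, 0, 1): 0.5}
    print(online_power((3, 0, 1, 0, 1), toy_threshold_policy, toy_power_table))
    print(online_power((3, 0, 1, 1, 1), toy_threshold_policy, toy_power_table))
```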



The complexity of the online procedure is negligible because
it is simply a table lookup. The complexity of the
offline procedure depends mostly on the solution of the power
control policy, which requires an iterative algorithm to solve
the Bellman equation in (14). Specifically, the complexity of
the reduced state MDP is given in the following theorem.

Theorem 1 (Complexity of the Reduced State MDP): The
worst case complexity of the reduced state MDP is $O(f(K))$,
where $f(K)$ is a monotonic decreasing function of the number
of users $K$. Furthermore, there exists a constant $K_0 > 0$ such
that for all $K > K_0$, the complexity is reduced to $O(NJ)$.

Proof: Please refer to Appendix C. ■

Theorem 1 implies that when $K$ is large enough, there is
no need to exploit the memory of the fading channels. The
threshold is fixed to $S_J$ regardless of the common feedback.
This is reasonable because the more competing users there are,
the smaller the chance for a single user to transmit. Hence, for
sufficiently large $K$, the users are only allowed to transmit
when the local CSI reaches the largest state $S_J$, so as to reduce
intensive collision. Note that the complexity of the offline
procedure is substantially reduced compared to the complexity
$O(NJ^3)$ of the brute-force solution of the original MDP in
Lemma 2.

V. Extension to Asymmetric Network 

In this section, we shall extend the delay control framework
to the asymmetric S-ALOHA network, in which heterogeneous
users have different fading channels. Specifically, let
$\mathcal{S}_k = \{S_i\}_{i=1}^{J_k}$ denote a set of $J_k$ CSI states, $p_{i,j}^{k}$ denote
the state transition probability and $\pi_i^{k}$ denote the stationary
probability for user $k$. A common threshold for all users
is not applicable to heterogeneous users and hence, the
system threshold $\gamma_m$ is extended to $\Gamma_m = \{\gamma_{k,m}\}_{k=1}^{K}$, where
$\gamma_{k,m}$ is the threshold for user $k$. As a result, the threshold
control policy is extended to $\Gamma_m = \pi_\Gamma(\Gamma_{m-1}, Z_{m-1})$, and the
power control policy for user $k$ is denoted as $\pi_{P_k}(\chi_{k,m})$.
The set of joint control policies $\pi = (\pi_\Gamma, \{\pi_{P_k}\}_{k=1}^{K})$ is easily
redefined as in Section III.

A. Optimal Power Control Policy under a Given Threshold 
Control Policy 

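For a given threshold control policy, Lemma 1 still holds.
Due to the extension of the single threshold $\gamma_m$ to the system
threshold $\Gamma_m$, the transition probability of the local system state of the
$k$-th user should be rewritten as:

$$
\begin{aligned}
\Pr\{\chi_{k,m+1} \mid \chi_{k,m}, \pi_{P_k}(\chi_{k,m})\} ={}& I\big(\Gamma_m = \pi_\Gamma(\Gamma_{m-1}, Z_{m-1})\big)\, \Pr\{H_{k,m+1} \mid H_{k,m}\} \\
&\times \Pr\{Z_m \mid Z_{m-1}, \{H_{k,i}, \Gamma_i\}_{i=m-1}^{m}\} \\
&\times \Pr\{Q_{k,m+1} \mid \chi_{k,m}, Z_m, \pi_{P_k}(\chi_{k,m})\}
\end{aligned}
\tag{19}
$$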

where the transition probability of the feedback state $Z$ is
not as simple as in the symmetric case shown in Appendix A-A.
For instance, the memory of the channel fading of the other $(K-1)$
users should also be exploited through the known information
$\Psi_{k,m-1} = \{H_{k,m-1}, \Gamma_{m-1}, Z_{m-1}\}$ of user $k$. Hence, the joint
probability of the CSI of the other users at the $m$-th slot is given by:

$$
\Pr\{\mathbf{H}_{-k,m} \mid \Psi_{k,m-1}\} = \sum_{\mathbf{H}_{-k,m-1}} \Pr\{\mathbf{H}_{-k,m-1} \mid \Psi_{k,m-1}\} \Big[ \prod_{i \neq k} \Pr\{H_{i,m} \mid H_{i,m-1}\} \Big]
\tag{20}
$$

where $\mathbf{H}_{-k,m} = \{H_{i,m}\}_{i=1, i \neq k}^{K}$ is the set
of all users' CSI at the $m$-th slot, excluding
the $k$-th one, and $\Pr\{\mathbf{H}_{-k,m-1} \mid \Psi_{k,m-1}\} =
\prod_{i \neq k} \Pr\{H_{i,m-1}\} / \sum_{\mathbf{H}_{-k,m-1}} \prod_{i \neq k} \Pr\{H_{i,m-1}\}$ is the belief of
the possible realization of $\mathbf{H}_{-k,m-1}$ conditioned on $\Psi_{k,m-1}$, with the sum in
the denominator taken over the realizations consistent with $\Psi_{k,m-1}$.
Then, the feedback transition probability is given by:

$$
\Pr\{Z_m \mid Z_{m-1}, \{H_{k,i}, \Gamma_i\}_{i=m-1}^{m}\} = \sum_{\mathbf{H}_{-k,m}} \Pr\{Z_m \mid \mathbf{H}_{-k,m}, H_{k,m}, \Gamma_m\}\, \Pr\{\mathbf{H}_{-k,m} \mid \Psi_{k,m-1}\}
\tag{21}
$$

As a result, the power control solution $\pi_{P_k}(\chi_{k,m})$ for the $k$-th
user is similar to Lemma 3 except that the transition probability
of the feedback state $Z_m$ is replaced by (21).

B. Threshold Control Policy

The system threshold determined by the threshold
control policy will influence the successful transmission
probability of each user. Specifically, let $\alpha_{k,m}(\Gamma_m)$ represent the
probability that user $k$ transmits alone at the $m$-th slot, i.e.,
$\alpha_{k,m} = \Pr\{H_{k,m} \geq \gamma_{k,m}, \bigcap_{i \neq k} H_{i,m} < \gamma_{i,m}\}$. Note that an
increase in $\alpha_{k,m}$ for user $k$ will result in a decrease in $\alpha_{i,m}$
for all $i \neq k$. Hence, there is a tradeoff among the
probabilities of successful transmission of the $K$ users. Unlike
the symmetric case, the threshold control policy shall not only
improve the delay performance, but also consider the fairness
among the $K$ heterogeneous users. In the S-ALOHA network [6]
and the centralized system [24], the authors proposed the
product-optimization form to take fairness into consideration.
Similarly, we consider a system threshold control policy that
maximizes the product-probability:

$$
\Gamma_m^* = \arg\max_{\Gamma_m} \prod_{k=1}^{K} \alpha_{k,m}(\Gamma_m)
\tag{22}
$$

The product-maximization could prevent users from having
very low successful transmission probability. Similar to the
symmetric case, we shall exploit the common information
$\{\Gamma_{m-1}, Z_{m-1}\}$ to enhance the probability of successful
transmission of the $K$ competing users over a collision channel. Given
all the transmission events $\{B_{i,m-1}\}_{i=1}^{K}$ (defined in Definition 5)
at the previous slot, the probability that user $k$ transmits
alone at the current slot is given by:

$$
\alpha_{k,m}\big(\Gamma_m, \Gamma_{m-1}, \{B_{i,m-1}\}_{i=1}^{K}\big) =
\Pr\{A_{k,m} \mid \gamma_{k,m}, \gamma_{k,m-1}, B_{k,m-1}\}
\prod_{i \neq k} \Pr\{\bar{A}_{i,m} \mid \gamma_{i,m}, \gamma_{i,m-1}, B_{i,m-1}\}
\tag{23}
$$

Substituting into (22), $\Gamma_m^*$ can be decoupled into single user
optimization problems, i.e.,

$$
\gamma_{k,m}^* = \arg\max_{\gamma_{k,m}} \Pr\{A_{k,m} \mid \gamma_{k,m}, \gamma_{k,m-1}, B_{k,m-1}\}
\big( \Pr\{\bar{A}_{k,m} \mid \gamma_{k,m}, \gamma_{k,m-1}, B_{k,m-1}\} \big)^{K-1}
\tag{24}
$$

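From $\{\Gamma_{m-1}, Z_{m-1}\}$, we can calculate the probability that
any specific user transmitted before and hence, the threshold
control policy can be solved by the single user optimization
problem given by:

$$
\gamma_{k,m}^* =
\begin{cases}
\arg\max_{\gamma_{k,m}} v_k (1 - v_k)^{K-1}, & \text{if } Z_{m-1} = 0 \\
\arg\max_{\gamma_{k,m}} \left[ \bar{p}_k v_k (1 - v_k)^{K-1} + p_k \zeta_k (1 - \zeta_k)^{K-1} \right], & \text{if } Z_{m-1} \neq 0
\end{cases}
\tag{25}
$$

where $\{v_k, \zeta_k\}$ are obtained in (27), $p_k =
\Pr\{A_{k,m-1} \mid \Gamma_{m-1}, Z_{m-1}\}$ is the conditional probability that
user $k$ transmitted at the $(m-1)$-th slot, and $\bar{p}_k = (1 - p_k)$.
Hence, we have

$$
p_k =
\begin{cases}
\dfrac{\eta_k \prod_{i \neq k} \bar{\eta}_i}{\sum_{j} \eta_j \prod_{i \neq j} \bar{\eta}_i}, & \text{if } Z_{m-1} = 1 \\[2ex]
\dfrac{\eta_k \big(1 - \prod_{i \neq k} \bar{\eta}_i\big)}{1 - \prod_{i} \bar{\eta}_i - \sum_{j} \eta_j \prod_{i \neq j} \bar{\eta}_i}, & \text{if } Z_{m-1} = e
\end{cases}
\tag{26}
$$

where $\eta_k = \Pr\{A_{k,m-1} \mid \gamma_{k,m-1}\} = \sum_{S_j \geq \gamma_{k,m-1}} \pi_j^{k}$ is the
transmission probability of user $k$, given that the threshold is
$\gamma_{k,m-1}$, and $\bar{\eta}_k = 1 - \eta_k$.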
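
The posterior in (26) is a direct application of Bayes' rule to independent transmission events, which can be checked by enumeration. The Python sketch below computes the probability that each user transmitted in the previous slot given the feedback, and compares it against a brute-force sum over all $2^K$ transmission patterns. The function names and the encoding of the collision feedback as the string `"e"` are assumptions made for this illustration.

```python
from itertools import product
from math import prod


def prev_tx_prob(eta, z_prev):
    """Pr{user k transmitted in the previous slot | feedback z_prev}.

    eta[k] is the prior transmission probability of user k.
    z_prev is 0 (idle), 1 (exactly one transmitter) or "e" (collision).
    """
    K = len(eta)
    others_silent = [prod(1 - eta[i] for i in range(K) if i != k) for k in range(K)]
    alone = [eta[k] * others_silent[k] for k in range(K)]
    if z_prev == 0:
        return [0.0] * K
    if z_prev == 1:
        norm = sum(alone)
        return [a / norm for a in alone]
    p_collision = 1.0 - prod(1 - e for e in eta) - sum(alone)
    return [eta[k] * (1.0 - others_silent[k]) / p_collision for k in range(K)]


def brute_force_prev_tx_prob(eta, z_prev):
    """Same quantity, obtained by enumerating every transmission pattern."""
    K = len(eta)
    joint, total = [0.0] * K, 0.0
    for pattern in product((0, 1), repeat=K):
        n = sum(pattern)
        z = 0 if n == 0 else (1 if n == 1 else "e")
        if z != z_prev:
            continue
        w = prod(e if a else 1 - e for e, a in zip(eta, pattern))
        total += w
        for k, a in enumerate(pattern):
            if a:
                joint[k] += w
    return [j / total for j in joint]


if __name__ == "__main__":
    eta = [0.2, 0.5, 0.7]
    for z in (1, "e"):
        print(z, prev_tx_prob(eta, z), brute_force_prev_tx_prob(eta, z))
```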

C. Summary of the Solution in Asymmetric Network 

The overall solution of the control policy in the asymmetric
network also consists of an offline procedure and an online
procedure. Compared with the symmetric case, the optimal
power control policy $\pi_{P_k}(\chi)$ is not the same for all the
heterogeneous users. In the offline procedure, $\pi_{P_k}(\chi)$ should be
calculated and stored in the corresponding user's table for online
lookup.

Similarly, the online procedure is a table lookup and
hence, its complexity is negligible. Since the threshold control
policy is decoupled into a one dimensional optimization problem
for each single user, the complexity of the offline procedure still
depends mostly on the iterative algorithm. Due to the
extension of the system threshold, the number of reduced states
$\tilde{\chi}_{k,m}$ is $O(N \prod_k J_k)$. However, Theorem 1 still holds in the
asymmetric network. For sufficiently large $K$, the threshold
control policy will increase the threshold of each user so as to
avoid intensive collision. As a result, the number of possible $\Gamma$
states is substantially reduced and the asymptotic complexity
for user $k$ becomes $O(NJ_k)$, as in the symmetric case.

VI. Numerical Results and Discussions 

In this section, we shall illustrate the delay performance of
the proposed control policy via numerical simulations. We set
the slot duration $\tau = 1$ ms and the bandwidth $W = 1$ kHz. The
packet arrivals and the CSI evolution follow the assumptions of the
system model (Section II). For each simulation scenario,
we calculate the optimal policies offline. In the online
application, the users simply implement the policy at each slot
corresponding to the system state observed in that slot. A
packet stays in the buffer until it is successfully serviced,
and the performance is evaluated over sufficiently many realizations.
Fig. 2 and Fig. 3 compare the LCSIHP threshold control policy
(with the corresponding optimal power control policy) in the symmetric
network with three reference baselines. Baseline 1 corresponds
to the binary scheduling algorithm in [6]. Baseline 2 corresponds
to the LCSIHP threshold control policy without power
control. Baseline 3 corresponds to the variable-rate algorithm
with power control proposed in [5]. We observe that there
is a significant gain in both delay and throughput of the
proposed policy over these three baselines. Fig. 4 compares the
packet dropping probability (a packet is dropped when it arrives while the
buffer is full, $Q = N$). It shows that the packet dropping performance is
also improved by the proposed policy. This can also
be inferred from the optimal power control policy, which will
potentially put more power on the node with larger QSI to
reduce the delay.

Fig. 2. Comparison of the delay performance between the proposed control
policy and three baselines in the symmetric network, with the 1st CSI model in
Table I for all the homogeneous users. We assume that the buffer length
$N = 5$, packet arrival rate $\lambda = 1$ for all $K = 5$ users, with mean packet size
$\bar{N}_b = 1$K bits.

Fig. 3. Comparison of the throughput performance between the proposed control
policy and three baselines in the symmetric network. The configuration is the same
as in Fig. 2.

Fig. 4. Comparison of Packet Dropping Probability (Conditioned on Packet
Arrival) between the proposed control policy and three baselines in the symmetric
network. The configuration is the same as in Fig. 2.

Fig. 5. Comparison of delay performance between BSP (power control
w.r.t. CSI additionally) and the proposed Asymm policy in the asymmetric network
with two heterogeneous users, whose CSI models are listed in Table
I (user1/user2). Specifically, BSP-user2 denotes the delay performance for
user 2 under the BSP policy, while BSP-user1 denotes that for user 1.
BSP-network denotes the average delay performance of the two users under the BSP
policy. Correspondingly, the notation starting with Asymm denotes the delay
performance under the Asymm policy. (Horizontal axis: Average Power $P_0$ (dB).)

TABLE I

Two FSMC CSI Models with the Same States yet Different Transition Probability (User1/User2)

|                      | H1       | H2       | H3        | H4        | H5        | H6        | H7        | H8        | H9       | H10      |
| States               | 0.055    | 0.074    | 0.112     | 0.153     | 0.237     | 0.531     | 0.894     | 1.343     | 2.588    | 4.493    |
| H1                   | 0.2/0.25 | 0.8/0.75 | 0         | 0         | 0         | 0         | 0         | 0         | 0        | 0        |
| H2                   | 0.2/0.25 | 0.3/0.3  | 0.5/0.45  | 0         | 0         | 0         | 0         | 0         | 0        | 0        |
| H3                   | 0        | 0.25/0.3 | 0.35/0.35 | 0.4/0.35  | 0         | 0         | 0         | 0         | 0        | 0        |
| H4                   | 0        | 0        | 0.3/0.34  | 0.3/0.3   | 0.4/0.36  | 0         | 0         | 0         | 0        | 0        |
| H5                   | 0        | 0        | 0         | 0.33/0.37 | 0.34/0.34 | 0.33/0.29 | 0         | 0         | 0        | 0        |
| H6                   | 0        | 0        | 0         | 0         | 0.33/0.37 | 0.34/0.34 | 0.33/0.29 | 0         | 0        | 0        |
| H7                   | 0        | 0        | 0         | 0         | 0         | 0.4/0.36  | 0.3/0.3   | 0.3/0.34  | 0        | 0        |
| H8                   | 0        | 0        | 0         | 0         | 0         | 0         | 0.4/0.35  | 0.35/0.35 | 0.25/0.3 | 0        |
| H9                   | 0        | 0        | 0         | 0         | 0         | 0         | 0         | 0.5/0.45  | 0.3/0.3  | 0.2/0.25 |
| H10                  | 0        | 0        | 0         | 0         | 0         | 0         | 0         | 0         | 0.8/0.75 | 0.2/0.25 |
| Stationary (User 1)  | 0.0137   | 0.0548   | 0.1097    | 0.1463    | 0.1755    | 0.1755    | 0.1463    | 0.1097    | 0.0548   | 0.0137   |
| Stationary (User 2)  | 0.0342   | 0.1027   | 0.154     | 0.1586    | 0.1529    | 0.1201    | 0.0979    | 0.0951    | 0.0634   | 0.0211   |
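
Because both FSMC models in Table I are tridiagonal (birth-death) chains, their stationary distributions follow from detailed balance, $\pi_i p_{i,i+1} = \pi_{i+1} p_{i+1,i}$. The Python sketch below recomputes the stationary probabilities from the printed transition probabilities and compares them with the two bottom rows of the table. Reading those bottom rows as stationary distributions, and taking the $H_5 \to H_6$ entry as $0.33/0.29$ so that the row sums to one, are assumptions of this illustration; the small residual differences are consistent with the table entries being rounded.

```python
def birth_death_stationary(up, down):
    """Stationary distribution of a birth-death chain via detailed balance.

    up[i]   = Pr{state i -> state i+1}
    down[i] = Pr{state i+1 -> state i}
    """
    weights = [1.0]
    for u, d in zip(up, down):
        weights.append(weights[-1] * u / d)
    total = sum(weights)
    return [w / total for w in weights]


# Off-diagonal entries read from Table I (User 1 / User 2).
UP_USER1 = [0.8, 0.5, 0.4, 0.4, 0.33, 0.33, 0.3, 0.25, 0.2]
DOWN_USER1 = [0.2, 0.25, 0.3, 0.33, 0.33, 0.4, 0.4, 0.5, 0.8]
UP_USER2 = [0.75, 0.45, 0.35, 0.36, 0.29, 0.29, 0.34, 0.3, 0.25]
DOWN_USER2 = [0.25, 0.3, 0.34, 0.37, 0.37, 0.36, 0.35, 0.45, 0.75]

# Bottom two rows of Table I.
TABLE_USER1 = [0.0137, 0.0548, 0.1097, 0.1463, 0.1755,
               0.1755, 0.1463, 0.1097, 0.0548, 0.0137]
TABLE_USER2 = [0.0342, 0.1027, 0.154, 0.1586, 0.1529,
               0.1201, 0.0979, 0.0951, 0.0634, 0.0211]


def max_abs_diff(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    pi1 = birth_death_stationary(UP_USER1, DOWN_USER1)
    pi2 = birth_death_stationary(UP_USER2, DOWN_USER2)
    print(max_abs_diff(pi1, TABLE_USER1), max_abs_diff(pi2, TABLE_USER2))
```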

Fig. 5 compares the delay performance in the asymmetric
network for two heterogeneous users. The mean packet arrival
rate is assumed to be $\lambda = 2$. Other settings are the same
as in the symmetric case. We compare the performance of the
proposed scheme in Section V (denoted Asymm) with a
baseline scheme designed for heterogeneous users in [6].
Specifically, we consider power control w.r.t. CSI under the
binary scheduling scheme in [6] to form a competitive baseline
(namely BSP). Observe that there is a discontinuity in the delay
performance of BSP; this is because in the small SNR regime
the system threshold for user 2 is lower than that of user
1, but they become the same in the large SNR regime. Observe also
that the proposed scheme has a significant performance gain in
terms of fairness and delay performance compared with the BSP
baseline.

Fig. 6 compares the delay performance in a larger asymmetric
network. There are 10 heterogeneous users, which
are divided into 5 groups. In each group, there are two
homogeneous users. Furthermore, we assume a larger buffer
size $N = 10$, and $\lambda = 0.4$. It can be observed that in a
larger network, the fairness improvement is less obvious. This
is because the threshold is increased to avoid intensive
collision under both the Asymm and the BSP policy, so the room
for improvement of the Asymm policy is reduced. However,
the delay performance is still clearly improved due to the
additional dimension in QSI for the Asymm policy.

Fig. 6. Comparison of delay performance between BSP (power control w.r.t.
CSI additionally) and the proposed Asymm policy in the asymmetric network with
10 heterogeneous users. Every group has two homogeneous users.

Fig. 7 compares the delay performance of the random access
channel with capture effect*. We set $\beta = 0.9$ to leave margin
for the possibility of capture in case of collision. It can be
observed that there is a significant performance gain of the
proposed scheme when there is capture.

*In our original formulation, we have set the transmit data rate according to
the instantaneous mutual information of the channel, i.e., $R_k = W\log_2\big(1 +
\frac{P_k H_k}{N_0 W}\big)$ (see (2)). As a result, the transmitted packet can be decoded only
when exactly one user transmits. In order to allow for the possibility
of capture, we set the data rate to be $R_k = \beta W\log_2\big(1 + \frac{P_k H_k}{N_0 W}\big)$ in
the simulation, where $\beta < 1$. As a result, we leave some margin in the
transmit data rate so that when there is a collision, the transmit data rate may
still be smaller than the instantaneous mutual information $C_k(\text{collision}) =
W\log_2\big(1 + \frac{P_k H_k}{\sum_{i \neq k} P_i H_i + N_0 W}\big)$, and packet detection is possible. The criterion
to determine the success of capture is based on comparing $R_k$ and
$C_k(\text{collision})$. If $R_k < C_k(\text{collision})$, then the packet from the $k$-th user
can be successfully decoded. Otherwise, it will be corrupted.

Fig. 7. Comparison of the delay performance in a 10-user symmetric
network with capture effect at the AP. Specifically, we consider narrow band
transmission ($W = 1$ kHz). The data rate is given by $R_k = \beta W\log_2\big(1 +
\frac{P_k H_k}{N_0 W}\big)$, where $\beta = 0.9$, buffer length $N = 10$, and packet arrival rate
$\lambda = 0.4$, with mean packet size $\bar{N}_b = 1$K bits. When a collision occurs,
the packet sent by the $k$-th user will be successfully detected at the AP if
$R_k \leq C_k(\text{collision}) = W\log_2\big(1 + \frac{P_k H_k}{\sum_{i \neq k} P_i H_i + N_0 W}\big)$. Otherwise, it will
be corrupted.
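
The capture criterion used in this simulation can be sketched in a few lines of Python. The helper below compares the backed-off data rate with the mutual information under interference; the function name `is_captured`, the list-based arguments and the numerical values are illustrative assumptions, and non-transmitting users are represented by zero power.

```python
from math import log2


def is_captured(k, powers, gains, beta, W, N0):
    """Return True if user k's packet can be decoded despite a collision.

    powers[i], gains[i] : transmit power and channel gain of user i
                          (power 0 for users that stay silent)
    beta                : rate back-off factor, beta < 1
    W, N0               : bandwidth and noise power spectral density
    """
    signal = powers[k] * gains[k]
    interference = sum(p * h for i, (p, h) in enumerate(zip(powers, gains)) if i != k)
    rate = beta * W * log2(1.0 + signal / (N0 * W))
    capacity = W * log2(1.0 + signal / (interference + N0 * W))
    return rate <= capacity


if __name__ == "__main__":
    W, N0, beta = 1000.0, 1e-6, 0.9
    # A lone transmitter is always decoded because beta < 1.
    print(is_captured(0, [1.0, 0.0], [1.0, 1.0], beta, W, N0))
    # Two equally strong transmitters: the interference is too high.
    print(is_captured(0, [1.0, 1.0], [1.0, 1.0], beta, W, N0))
    # A very weak interferer leaves enough margin for capture.
    print(is_captured(0, [1.0, 1.0], [1.0, 1e-4], beta, W, N0))
```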

VII. Summary 

We considered delay-sensitive transmit power and threshold
control design in the S-ALOHA network. The users adaptively
adjust their transmission threshold and power to minimize
the delay of the network. The jointly optimal policy is
shown to be computationally intractable and hence a brute
force solution is infeasible. However, for a given
threshold control policy, we decompose the optimal power
control policy into a reduced state MDP for a single user, in
which the overall complexity is $O(NJ)$. The threshold control
policy is proposed by exploiting the special structure of the
collision channel and the common feedback to derive a low
complexity solution, which is a one dimensional optimization
problem in both symmetric and asymmetric networks. The delay
performance of the proposed design is shown to have
substantial gain relative to conventional random access approaches
in both networks.

Appendix A 

Proof of Lemma 1: Transition Probability of
Local System State

Note that the transition event is from $\chi_{k,m}$ to $\chi_{k,m+1} =
\{Q_{k,m+1}, H_{k,m}, \gamma_m, Z_m, H_{k,m+1}\}$. Specifically, the system
threshold $\gamma_m$ is given by the threshold control policy, i.e.,
$\gamma_m = \pi_\gamma(\gamma_{m-1}, Z_{m-1})$ with certainty, and $\Pr\{H_{k,m+1} =
S_j \mid H_{k,m} = S_i\} = p_{i,j}$, independent of other states. The
transition probabilities of the feedback and queue states are given
below.



A. Feedback State Transition 

From the position of user $k$, the common feedback $Z_{m-1}$
together with $\{H_{k,m-1}, \gamma_{m-1}\}$ provides information on how
many of the other $(K-1)$ users transmitted in the previous
slot. This can be utilized to improve the prediction of
their transmission behavior in the current slot. Moreover, whether
user $k$ transmits in the current slot will influence the realization
of $Z_m$ and hence, the feedback transition is determined
only by $\{Z_{m-1}, \{H_{k,i}, \gamma_i\}_{i=m-1}^{m}\}$. Next we shall find
$\Pr\{Z_m \mid Z_{m-1}, \{H_{k,i}, \gamma_i\}_{i=m-1}^{m}\}$ (denoted $\Pr\{Z_m \mid Z_{m-1}\}$ for
simplicity).

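In fact, the common feedback information could modify the
stationary probability of the CSI states. For instance, $Z_{m-1} = 0$
is equivalent to $\bigcap_k \{H_{k,m-1} < \gamma_{k,m-1}\}$. Given $H_{k,m} < \gamma$, the
stationary probability $\Pr\{H_{k,m} = S_j\}$ should be modified
as $\pi_j^{k-}(\gamma) = \pi_j^{k} / \sum_{S_i < \gamma} \pi_i^{k}$. Similarly, given $H_{k,m} \geq \gamma$, the
stationary probability $\Pr\{H_{k,m} = S_j\}$ should be modified
as $\pi_j^{k+}(\gamma) = \pi_j^{k} / \sum_{S_i \geq \gamma} \pi_i^{k}$. Specifically, we introduce the following
definition for user $k$, where $\gamma_{k,m}$ is the threshold for the $k$-th
user, utilized in Section V.

Definition 5 (Transmission Event of the k-th User): Let
$A_{k,m}$ denote the event that user $k$ attempts to transmit at
the $m$-th slot, i.e., $H_{k,m} \geq \gamma_{k,m}$, while $\bar{A}_{k,m}$ denotes the
complementary event, i.e., $H_{k,m} < \gamma_{k,m}$. Furthermore, let
$B_{k,m} \in \{A_{k,m}, \bar{A}_{k,m}\}$ denote the transmission event realized by user $k$ at the $m$-th slot.

As a result, the probability of the transmission event is given
by:

$$
\Pr\{A_{k,m} \mid \gamma_{k,m}, \gamma_{k,m-1}, B_{k,m-1}\} =
\begin{cases}
v_k \triangleq \sum_{S_i < \gamma_{k,m-1}} \sum_{S_j \geq \gamma_{k,m}} \pi_i^{k-}(\gamma_{k,m-1})\, p_{i,j}^{k}, & \text{if } B_{k,m-1} = \bar{A}_{k,m-1} \\
\zeta_k \triangleq \sum_{S_i \geq \gamma_{k,m-1}} \sum_{S_j \geq \gamma_{k,m}} \pi_i^{k+}(\gamma_{k,m-1})\, p_{i,j}^{k}, & \text{if } B_{k,m-1} = A_{k,m-1}
\end{cases}
\tag{27}
$$

For simplicity, let $\bar{v}_k = 1 - v_k$ and $\bar{\zeta}_k = 1 - \zeta_k$. Note that
in the symmetric network, $\forall k$, $v_k = v$ and $\forall k$, $\zeta_k = \zeta$. Therefore,
we ignore the user index $k$ in the symmetric network.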
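
The two conditional transmission probabilities in (27) can be computed directly from the FSMC transition matrix and its stationary distribution. The Python sketch below does so for thresholds expressed as state indices; the function name `tx_probs`, the index-based threshold encoding and the two-state toy chain are assumptions for illustration. In the toy example the chain is "sticky", so the probability of transmitting again exceeds the probability of starting to transmit, in line with the slow-fading assumption used in Appendix C.

```python
def tx_probs(P, pi, g_prev, g_now):
    """Return (v, zeta) for one user.

    P      : transition matrix, P[i][j] = Pr{next state j | current state i}
    pi     : stationary distribution of the CSI states
    g_prev : previous threshold as a state index (transmit iff state >= g_prev)
    g_now  : current threshold as a state index
    v      = Pr{transmit now | did not transmit in the previous slot}
    zeta   = Pr{transmit now | transmitted in the previous slot}
    A value of None is returned when the conditioning event has zero mass.
    """
    J = len(pi)

    def conditional(prev_states):
        mass = sum(pi[i] for i in prev_states)
        if mass == 0.0:
            return None
        return sum(pi[i] / mass * P[i][j]
                   for i in prev_states for j in range(g_now, J))

    v = conditional(range(0, g_prev))
    zeta = conditional(range(g_prev, J))
    return v, zeta


if __name__ == "__main__":
    # Toy two-state chain with stationary distribution (2/3, 1/3).
    P = [[0.9, 0.1],
         [0.2, 0.8]]
    pi = [2.0 / 3.0, 1.0 / 3.0]
    print(tx_probs(P, pi, g_prev=1, g_now=1))
```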

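• Feedback transits from $Z_{m-1} = 0$: All the other
$(K-1)$ users did not transmit at the previous slot, and the
transition probability is given by:

$$
\Pr\{Z_m \mid Z_{m-1} = 0\} =
\begin{cases}
\bar{v}^{K-1} I(\bar{A}_{k,m}), & \text{if } Z_m = 0 \\
\bar{v}^{K-1} I(A_{k,m}) + (K-1)\, v\, \bar{v}^{K-2} I(\bar{A}_{k,m}), & \text{if } Z_m = 1 \\
(1 - \bar{v}^{K-1}) I(A_{k,m}) + \big(1 - \bar{v}^{K-1} - (K-1)\, v\, \bar{v}^{K-2}\big) I(\bar{A}_{k,m}), & \text{if } Z_m = e
\end{cases}
\tag{28}
$$

• Feedback transits from $Z_{m-1} = 1$: Only one user's
CSI exceeded $\gamma_{m-1}$ at the previous slot, which can be
divided into two cases.

If $H_{k,m-1} \geq \gamma_{m-1}$ ($A_{k,m-1}$ happens), all the other
users did not transmit at the previous slot. The CSI
information of the other users is the same as in the $Z_{m-1} = 0$
case, so the transition probability is $\Pr\{Z_m \mid Z_{m-1} =
1, A_{k,m-1}\} = \Pr\{Z_m \mid Z_{m-1} = 0\}$.

If $H_{k,m-1} < \gamma_{m-1}$ ($\bar{A}_{k,m-1}$ happens), only one of the other
users transmitted at the previous slot. Then the transition
probability is given by:

$$
\Pr\{Z_m \mid Z_{m-1} = 1, \bar{A}_{k,m-1}\} =
\begin{cases}
\bar{\zeta}\, \bar{v}^{K-2} I(\bar{A}_{k,m}), & \text{if } Z_m = 0 \\
\bar{\zeta}\, \bar{v}^{K-2} I(A_{k,m}) + \big(\zeta\, \bar{v}^{K-2} + (K-2)\, \bar{\zeta}\, v\, \bar{v}^{K-3}\big) I(\bar{A}_{k,m}), & \text{if } Z_m = 1 \\
(1 - \bar{\zeta}\, \bar{v}^{K-2}) I(A_{k,m}) + \big(1 - \bar{\zeta}\, \bar{v}^{K-2} - \zeta\, \bar{v}^{K-2} - (K-2)\, \bar{\zeta}\, v\, \bar{v}^{K-3}\big) I(\bar{A}_{k,m}), & \text{if } Z_m = e
\end{cases}
\tag{29}
$$

• Feedback transits from $Z_{m-1} = e$: At least two users
transmitted at the previous slot, which should also be
divided into two cases. We first find the probability of the
exact number of users involved in the transmission. Specifically, given
threshold $\gamma$, the probability that $k$ out of $K$ users will
transmit is $p_{\gamma}^{(K,k)} = \binom{K}{k} \big(\sum_{S_j \geq \gamma} \pi_j\big)^{k} \big(\sum_{S_j < \gamma} \pi_j\big)^{K-k}$.
Given the additional information that at least $n$ users will
transmit, the probability is improved as

$$
p_{\gamma,n}^{(K,k)} = \frac{p_{\gamma}^{(K,k)}}{1 - \sum_{l=0}^{n-1} p_{\gamma}^{(K,l)}}, \quad \forall k \geq n
\tag{30}
$$

If $H_{k,m-1} \geq \gamma_{m-1}$ ($A_{k,m-1}$ happens), at least one of the
other users transmitted, i.e.,

$$
\Pr\{Z_m \mid Z_{m-1} = e, A_{k,m-1}\} = \sum_{l=1}^{K-1} p_{\gamma_{m-1},1}^{(K-1,l)} \times
\begin{cases}
\bar{\zeta}^{l}\, \bar{v}^{K-1-l} I(\bar{A}_{k,m}), & \text{if } Z_m = 0 \\
\bar{\zeta}^{l}\, \bar{v}^{K-1-l} I(A_{k,m}) + \big(l\, \zeta\, \bar{\zeta}^{l-1} \bar{v}^{K-1-l} + \bar{\zeta}^{l} (K-1-l)\, v\, \bar{v}^{K-l-2}\big) I(\bar{A}_{k,m}), & \text{if } Z_m = 1 \\
(1 - \bar{\zeta}^{l}\, \bar{v}^{K-l-1}) I(A_{k,m}) + \big(1 - \bar{\zeta}^{l}\, \bar{v}^{K-1-l} - l\, \zeta\, \bar{\zeta}^{l-1} \bar{v}^{K-1-l} - \bar{\zeta}^{l} (K-1-l)\, v\, \bar{v}^{K-l-2}\big) I(\bar{A}_{k,m}), & \text{if } Z_m = e
\end{cases}
\tag{31}
$$

If $H_{k,m-1} < \gamma_{m-1}$ ($\bar{A}_{k,m-1}$ happens), at least two of the
other users transmitted, i.e.,

$$
\Pr\{Z_m \mid Z_{m-1} = e, \bar{A}_{k,m-1}\} = \sum_{l=2}^{K-1} p_{\gamma_{m-1},2}^{(K-1,l)} \times
\begin{cases}
\bar{\zeta}^{l}\, \bar{v}^{K-1-l} I(\bar{A}_{k,m}), & \text{if } Z_m = 0 \\
\bar{\zeta}^{l}\, \bar{v}^{K-1-l} I(A_{k,m}) + \big(l\, \zeta\, \bar{\zeta}^{l-1} \bar{v}^{K-1-l} + \bar{\zeta}^{l} (K-1-l)\, v\, \bar{v}^{K-l-2}\big) I(\bar{A}_{k,m}), & \text{if } Z_m = 1 \\
(1 - \bar{\zeta}^{l}\, \bar{v}^{K-l-1}) I(A_{k,m}) + \big(1 - \bar{\zeta}^{l}\, \bar{v}^{K-1-l} - l\, \zeta\, \bar{\zeta}^{l-1} \bar{v}^{K-1-l} - \bar{\zeta}^{l} (K-1-l)\, v\, \bar{v}^{K-l-2}\big) I(\bar{A}_{k,m}), & \text{if } Z_m = e
\end{cases}
\tag{32}
$$

B. Queue State Transition

The queue state transition is correlated with the feedback.
For instance, if $Z_m \neq 1$, the probability of a decrease in the queue
state should be zero, because no data is successfully received.
To obtain a simple solution, we consider the same case as
[22], where the time slot duration $\tau$ is substantially smaller
than the average packet inter-arrival time and the average packet
service time ($\tau \ll \frac{1}{\lambda}$ and $\tau \ll \frac{1}{\mu}$), where $\mu$ is the average
packet service rate defined later.

• Packet arrival: Since packet arrival follows a Poisson
distribution with mean arrival rate $\lambda$, the transition
probability of the queue state related to packet arrival is given
by:

$$
p_{q,q+1} = \Pr\{Q_{k,m+1} = q + 1 \mid Q_{k,m} = q\} = \lambda\tau
\tag{33}
$$

• Packet departure: The packet length follows an exponential
distribution with mean packet size $\bar{N}_b$, so the packet
service time also follows an exponential distribution.
Conditioned on the state $(\chi_{k,m}, Z_m)$ and the data rate given in
(2), the mean packet service rate is:

$$
\mu\big(Q_{k,m} = q, \chi_{k,m}, Z_m, \pi_P(\chi_{k,m})\big) = \frac{W}{\bar{N}_b} \log_2\Big(1 + \frac{P_{k,m} H_{k,m}}{N_0 W}\Big)\, I(Z_m = 1)
\tag{34}
$$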

where $P_{k,m} = \pi_P(\chi_{k,m})$ is the power transmitted at the
current slot, determined by the power control policy. Furthermore,
$Z_m \neq 1$ will lead to zero service rate. Another
case that leads to zero service rate is $H_{k,m} < \gamma_{k,m}$, in which
the power control policy will set $P_{k,m} = 0$. Hence, the
probability of packet departure is given by:

$$
p_{q,q-1} = \Pr\{Q_{k,m+1} = q - 1 \mid Q_{k,m} = q, \chi_{k,m}, Z_m, \pi_P(\chi_{k,m})\}
= \mu\big(Q_{k,m} = q, \chi_{k,m}, Z_m, \pi_P(\chi_{k,m})\big)\tau
\tag{35}
$$

• No change in the queue of the $k$-th user: The transition probability
corresponding to no change in the queue state is given by:

$$
p_{q,q} = \Pr\{Q_{k,m+1} = q \mid Q_{k,m} = q, \chi_{k,m}, Z_m, \pi_P(\chi_{k,m})\}
= 1 - p_{q,q-1} - p_{q,q+1}
\tag{36}
$$

Since $\lambda\tau \ll 1$ and $\mu\tau \ll 1$, the probability of multiple
packet arrivals or packet departures is negligible and hence
$p_{q,p} = 0$ for $|p - q| > 1$. Thus the transition probability
of the queue state is given by $\Pr\{Q_{k,m+1} \mid \chi_{k,m}, Z_m, \pi_P(\chi_{k,m})\}$,
which completes the proof.
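
The birth-death structure of the queue can be made concrete with a short Python sketch that builds one row of the queue transition matrix from $\lambda\tau$ and $\mu\tau$. The function name `queue_row` and the boundary handling (no arrival transition when the buffer is full, no departure when it is empty) are assumptions of this illustration.

```python
def queue_row(q, N, lam, mu, tau):
    """Transition probabilities from queue length q to lengths 0..N.

    lam : mean packet arrival rate
    mu  : mean packet service rate in the current state (0 if Z_m != 1
          or the user is below threshold)
    tau : slot duration, assumed small so that lam*tau and mu*tau << 1
    """
    p_up = lam * tau if q < N else 0.0    # arrival; a full buffer drops it
    p_down = mu * tau if q > 0 else 0.0   # departure needs a queued packet
    if p_up + p_down > 1.0:
        raise ValueError("tau is too large for the birth-death approximation")
    row = [0.0] * (N + 1)
    if q < N:
        row[q + 1] = p_up
    if q > 0:
        row[q - 1] = p_down
    row[q] = 1.0 - p_up - p_down
    return row


if __name__ == "__main__":
    # Slot of 1 ms, one arrival per second, 50 departures per second.
    print(queue_row(q=2, N=5, lam=1.0, mu=50.0, tau=1e-3))
```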

Appendix B 

Proof of Lemma 4: Decidability of the Unichain of
Reduced State

Denote the state (excluding $Q$) in $\tilde{\chi}$ as $\Phi = (H, \gamma, Z)$,
whose transition probability has been given in Lemma 1,
independent of $Q$ and the power control policy. Specifically,

$$
\Pr\{\Phi_{m+1} \mid \Phi_m\} = \Pr\{H_m \mid H_{m-1}\}\, \Pr\{Z_m \mid \Phi_m, H_m, \gamma_m\},
$$

where $\gamma_m$ is determined from the given threshold control
policy. Then, the recurrent classes of $\Phi$ can be found.
Furthermore, the queue state evolves as a birth-death process
under every power control policy, forming a unichain itself.
As a result, the unichain of the reduced state $\tilde{\chi} = (Q, \Phi)$ is
decidable.

Appendix C 

Proof of Theorem 1: Complexity of the Reduced
State MDP

$\{\zeta_k, v_k\}$ in (27) are functions of $\{\gamma_{m-1}, \gamma_m\}$. Specifically,
we assume $\zeta_k > v_k$ for the same $\{\gamma_{m-1}, \gamma_m\}$. This is a
practical assumption for fading channels, because the CSI
states will not change fast [14]. Then, we have the following
lemma about the threshold control policy $\gamma_m^*$ in (17).

Lemma 5 (Monotonic Increasing Function of $\gamma_m^*$ w.r.t. $K$):
Given $\{\gamma_{m-1}, Z_{m-1}\}$, if $K_2 > K_1$, then
$\gamma_m^*(\gamma_{m-1}, Z_{m-1}, K_2) \geq \gamma_m^*(\gamma_{m-1}, Z_{m-1}, K_1)$. Specifically,
if $\gamma_m^*(\gamma_{m-1}, Z_{m-1}, K_1) < \gamma_{m-1}$, then for a sufficiently large
$K_2$, $\gamma_m^*(\gamma_{m-1}, Z_{m-1}, K_2) > \gamma_m^*(\gamma_{m-1}, Z_{m-1}, K_1)$.

Proof: Given $\{\gamma_{m-1}, Z_{m-1}\}$, $\gamma_m$ only influences the
$\{\zeta, v, \bar{\zeta}, \bar{v}\}$ parameters in (17), and from (27), $\{\zeta, v\}$ are
monotonic decreasing ($\{\bar{\zeta}, \bar{v}\}$ are monotonic increasing) functions
of $\gamma_m$. As a result, Lemma 5 is obvious when $Z_{m-1} \neq e$. If
$Z_{m-1} = e$, when $K_1$ is increased to $K_2$, by comparing each
term of the same $k$ case and the additional $k = (K_1 + 1), \ldots, K_2$
cases, and using the assumption of $\zeta > v$ for the same $\{\gamma_{m-1}, \gamma_m\}$,
the monotonic increasing characteristic is also obvious. ■

As the reduced state is $\tilde{\chi} = (Q, H, \gamma, Z)$, the worst case
complexity corresponds to the total number of states of $\tilde{\chi}$,
i.e., $O(NJ^2)$. On the other hand, since the QSI and CSI states
are recurrent, the least number of states in a recurrent class
is $O(NJ)$. Next we will show that the number of states of
the system threshold $\gamma$ decreases as $K$ increases, and $\gamma = S_J$
regardless of the feedback when $K$ is large enough, which
completes the proof.

Given $K_1$, let $\gamma_{\min}(K_1)$ be the minimal threshold in
a recurrent reduced state class. Specifically, $\gamma_{\min}(K_1) =
\gamma_m^*(\gamma_{x_1}, Z_{x_1}, K_1)$, where $\gamma_{x_1} > \gamma_{\min}(K_1)$. By Lemma 5,
for a sufficiently large $K_2 > K_1$, $\gamma_{\min}(K_2) > \gamma_{\min}(K_1)$ and
hence, the minimal threshold in the recurrent class is increased.
Following this argument, the minimal threshold will increase
to the largest CSI state $S_J$ when $K$ is increased to a large
number $K_0$.